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Abstract 

The power-expected-posterior (PEP) prior developed for variable selection in 
normal regression models provides an objective, automatic, consistent and parsi¬ 
monious model selection procedure. At the same time it resolves the conceptual 
and computational problems due to the use of imaginary data. Namely, (i) it dis¬ 
penses with the need to select and average across all possible minimal imaginary 
samples, and (ii) it diminishes the effect that the imaginary data have upon the pos¬ 
terior distribution. These attributes allow for large sample approximations, when 
needed, in order to reduce the computational burden under more complex models. 
In this work we generalize the applicability of the PEP methodology, focusing on 
the framework of generalized linear models (GLMs), by introducing two new PEP 
definitions which are in effect applicable to any general model setting. Hyper prior 
extensions for the power-parameter that regulates the contribution of the imaginary 
data are further considered. Under these approaches the resulting PEP prior can 
be asymptotically represented as a double mixture of ^r-priors. Eor estimation of 
posterior model and inclusion probabilities we introduce a tuning-free Gibbs-based 
variable selection sampler. Several simulation scenarios and one real data example 
are considered in order to evaluate the performance of the proposed methods com¬ 
pared to other commonly used approaches based on mixtures of ^r-priors. Empirical 
results indicate that the GLM-PEP adaptations are more effective when the aim is 
parsimonious inference. 

Keywords: expected-posterior prior, g/hyper-g priors, generalized linear models, 
imaginary data, objective Bayesian model selection, power-prior 


1 Introduction 

1.1 Motivation 

In this article, the variable selection problem in Generalized Linear Models (GLMs) is 
analyzed from an objective and fully automatic Bayesian model choice perspective. The 
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desire for an automatic Bayesian procedure is motivated by the appealing property of 
creating a method that can be easily implemented in complex models without the need 
of specihcation of tuning parameters. Regarding the justihcation for the necessity of an 
objective model choice approach we can argue that in variable selection problems we are 
rarely conhdent about any given set of regressors as explanatory variables, which translates 
to little prior information about the regression coefficients. Therefore, we would like to 
consider default prior distributions, which in many cases are improper, thus leading to 
undetermined Bayes factors. 

1996a|[&) and expected-posterior (EP) priors 


Intrinsic priors 

(Berger and Pericchi, 

Perez and Berger 

2002 

) can be considered 


for model comparison in regression models. They are developed through the utilization of 
the device of “training” or “imaginary” samples, respectively, of “minimal” size and there¬ 
fore the resulting priors have a further advantage of being compatible across models; see 


Consonni and Veronese (2008). Intrinsic priors as well as EP priors have been proposed in 


many articles for use on the variable selection problem in the Gaussian linear models (see 
for example Casella and Moreno, 2006| ; however, to the best of our knowledge, there is 
only one study that proposes this methodology for GLMs, which is restricted to the case 
of the probit model (Leon-Novelo, Moreno and Gasella, 2012). We believe that this is due 


to the fact that derivation of such priors can be a very challenging task, especially under 
complex models, leading to computationally intensive solutions. Furthermore, by using 
minimal training samples, large sample approximations (e.g Laplace approximations) can 
not be applied in many cases. 

Our contribution with this article is two-fold. First, we develop an automatic, objective 
Bayesian variable selection procedure for GLMs based on the EP prior methodology. In 


particular we consider the power-expected-posterior (PEP) prior of Fouskakis, Ntzoufras 


and Draper (2015), that diminishes the effect that the imaginary data have upon the 


posterior distribution and therefore the need of using minimal training samples. Through 
this approach we can consider imaginary samples of sufficiently large size and therefore 
be able to apply, when needed, large sample approximations. Secondly, we introduce a 
simple tuning-free Gibbs-based variable selection sampler for estimating posterior model 
and variable inclusion probabilities. 


1.2 Bayesian variable selection for generalized linear models 

Despite the importance and popularity of GLMs, Bayesian variable selection techniques 
for non-Gaussian models are scarce in relation to the abundance of methods that are 
available for the normal linear model. This is mainly due to the analytical intractability 
which arises outside the context of the normal model. The relatively limited studies that 
focus on non-Gaussian models, mainly aim to overcome analytical intractability through 
the use of Laplace approximations and/or stochastic model search algorithms. 


Ghen and Ibrahim (2003) introduced a class of conjugate priors based on an initial 


prior prediction of the data (similar to the concept of imaginary data) associated with a 
scalar precision parameter. This approach essentially leads to a GLM analogue of the g 
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prior (Zellner and Siow, 1980, Zellner , [1986 ) where the precision parameter has the role 
of g. However, the prior of Chen and Ibrahim (2003) is not analytically available for non- 
Gaussian GLMs and, therefore, Ghen, Huang, Ibrahim and Kim (2008) proposed a Markov 


chain Monte Garlo (MGMG) based solution for this class of models. Ntzoufras, Della- 


portas and Forster (2003) used a unit-information ^f-prior (Kass and Wasserman, 1995) 


for variable selection and link determination in binomial models through reversible-jump 
MGMG sampling. Sabanes Hove and Held (2011) consider the asymptotic distribution 


of the prior of Ghen and Ibrahim (2003), which results in the same gf-prior form used in 


Ntzoufras et ah (2003), and further consider mixtures of ^f-priors along the lines of Liang, 


Paulo, Molina, Glyde and Berger (2008). Gomputation of the marginal likelihood in Sa¬ 


banes Bove and Held (2011) is handled through an integrated Laplace approximation. 


based on Gauss-Hermite quadrature, which allows variable selection through full enu¬ 
meration for small/moderate model spaces or through MGMG model composition (MG^) 
algorithms (Madigan and York, 1995) for spaces of large dimensionality. Other GLM 


variations of ^f-prior mixtures have an empirical Bayes (EB) flavor, using the observed 
or expected information matrix evaluated at the ML estimates as the prior variance- 
covariance matrix (Hansen and Yu, 2003, Wang and George, 2007, Li and Glyde, 2015). 
A computational beneht of the EB approach is that the integrated Laplace approxima¬ 
tion can be expressed in closed form as a set of functions of the ML estimates. For 
large model spaces, where full enumeration is infeasible, Li and Glyde (2015) recommend 
using the Bayesian adaptive sampling algorithm (Glyde, Ghosh and Littman, 2011). A 
relevant prior specification is the information-matrix prior of Gupta and Ibrahim (2009) 


which combines ideas from the ^f-prior and Jeffreys prior for GLMs (Ibrahim and Laud 


1991); under a Gaussian likelihood the information-matrix prior becomes the standard 


^f-prior, while for g ^ oo it reduces to Jeffreys prior which is proper only for the case of 
the binomial model. However, in applications Gupta and Ibrahim (2009) do not directly 
consider the problem of stochastic search over the entire model space. Finally, one appli¬ 
cation of Bayesian intrinsic variable selection for probit models via MGMG is presented 
in 


Leon-Novelo et al. (2012). 


As seen, at present most methods for GLMs are anchored to the ^f-prior approach 
and therefore cannot be regarded as objective and fully automatic approaches in the 
sense that one cannot conduct an analysis starting with non-informative, flat priors. In 
this work we present an automatic, objective Bayesian variable selection procedure for 
GLMs based on the PEP methodology. The structure of the remainder of the paper is as 
follows. In Section we provide an overview of the PEP prior formulation and discuss 
the applicability problems that arise in the case of non-Gaussian models. We proceed 
with two alternative dehnitions, which generalize the applicability of the PEP prior for 
GLMs. In Section|^we introduce a Gibbs-based sampler suitable for variable selection and 
for single-model posterior inference. Section presents an hierarchical extension of the 
methodology which involves assigning a hyper-prior to the power-parameter that controls 
the contribution of the imaginary data. Illustrative examples and comparisons with other 
methods using both simulated and real data sets are presented in Section Section 
concludes with a summary and a discussion of future research directions. 
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2 PEP priors for generalized linear models 

We consider n realizations of a response variable Y accompanied by a set of predictors 
Xi,X 2 , ...,Xp which may potentially characterize the response. To fix notation, let 7 G 
{ 0 , 1 }^ index all 2 ^ snbsets of predictors serving as a model indicator, where each element 
7 j, for j = 1 ,... ,p, is an indicator of the inclusion of Xj in the structure of model M-^. 
Moreover, let 7i denote the number of active covariates in model Within 

the GLM framework, the response Y follows a distribution which is a member of the 
exponential family. The sampling distribution of the response vector y = (|/i,... ^UnY' 
under model is given by 


= exp 



O'i ( 0 ')') 




2=1 


( 2 . 1 ) 


The functions a(-), b{-) and c(-) determine the particular distribution of the exponential 
family. The parameter is the canonical parameter which regulates the location of the 
distribution through the relationship = gob'~^ {v-yii)) > where g{-) is the link 

function connecting the mean of the response Hi with the linear predictor 
and g o is the inverse function oi g o = g{b'{'d^(^i^)'j. Commonly, a 

canonical d function is used, so that = g'y{i)- We assume that an intercept term is 

included in all 2 ^ models under consideration, so (3^ is the x 1 vector of regression 
coefficients, where + 1 , and X-y(j) is the i-th row of the n x d-^ design matrix 

with a vector of I’s in the first column and the 7 Mh subset of the Xj’s in the remaining 
p-y columns. The parameter controls the dispersion and the function a;(-) is typically of 
the form ai{(j)Y = where Wi is a known fixed weight that may either vary or remain 

constant per observation. In addition, the nuisance parameter (p~^ is commonly considered 
as a common parameter across models, therefore we assume throughout that <p^ = <p 
without loss of generality. Given the above formulation, we have that £(?/,) = b’{'d~^{i)) 
and Var(j/i) = 6"(d.^(j))Q!i(0). 


The GLM parameters 6^ = (/3 , cp) are divided into the predictor effects (3 and the 


parameter cp which affects dispersion. In the following we work along the lines of Fouskakis 


and Ntzoufras (2016) considering the conditional PEP prior; i.e. we construct the PEP 


prior of l3^ conditional on cp. 


2.1 An overview of the PEP prior 


The PEP prior, initially formulated in Fouskakis et al. (2015) for the case of the normal 


linear model, creatively fuses ideas from the power-prior (Ibrahim and Ghen, 2000) and 
the EP prior (Perez and Berger, 2002). Let us first describe the EP prior approach. Gon- 
sider that we have imaginary data y* = {yl,... ,yn*Y coming from the prior-predictive 
distribution m*{y*) of a “suitable” reference model M*. Then, given y*, for any model 
M-y with sampling distribution / 7 (y*|/ 3 -y, 0 ) as defined in ( 2 . 1 ) and a default baseline- 
prior of the form 7 r ^^(/3 , cp) = ti^{ f3Y(p)7ipP(cp), we have a corresponding baseline-posterior 
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distribution given by 




m^{y*) 


( 2 . 2 ) 


The EP prior for the parameters of model is then dehned as the posterior distribution 


in (2.2), averaged over all possible imaginary samples, i.e. 


irF(/3,,i^)= / ir”(/3 ..^|y')))i'(y*)dy* 


(2.3) 


The reference model M* is commonly considered to be the simplest model, i.e. the (null) 
intercept model in the regression framework. This selection makes the EP approach es¬ 


sentially equivalent to the arithmetic intrinsic Bayes factor of Berger and Pericchi (19966) 


A key issue in the implementation of the EP prior is the selection of the size n* 
of the imaginary sample. In order to minimize the effect of the prior on the posterior 
inference, the reasonable solution is to choose the smallest possible n* for which the 
posterior is proper. This leads to the concept of the so-called minimal training sample, 
which however requires calculating the arithmetic mean (or other appropriate measures 
of centrality) of Bayes factors over all possible minimal training samples. In addition, 
when it comes to regression the same problem arises with the design matrix as one has 
to choose appropriate covariate values for each minimal training sample, and this further 
depends upon the choice of the reference model. A computational solution to deal with the 
aforementioned problems has been proposed in the literature (e.g. Casella and Moreno 


2006, Moreno and Giron, 2008), however, this solution is only applicable under the normal 


linear regression model and in addition under this approach it is not clear whether the 
resulting Bayes factors retain their intrinsic nature. Furthermore, the effect of the EP 
prior can become influential when the sample size is not much larger than the number of 


predictors; see Fouskakis et ah (2015) for details. Finally, when n* is small and (2.3) is 


hard to derive, large sample approximations cannot be applied. 

The PEP prior resolves the problem of dehning and averaging over minimal training 
samples and at the same time scales down the effect of the imaginary data on the posterior 
distribution. The core idea lies in substituting the likelihood function involved in the 


calculation of (2.2) by a powered-version of it, i.e. raising it to the power of 1/5, similar to 


the power-prior approach of Ibrahim and Chen (2000). Following Fouskakis and Ntzoufras 


(2016), the conditional PEP prior in the GLM setup, under the null-reference model Mq, 


is dehned as follows 


<^^(/3.,,0|5)=<^^(/3J0,5)7r:"(0), 


(2.4) 


5 























where 


7rfP(/3^|0,5) 


j 7r^J(/37ly*,0,^)"io(y*l0)'^)dy*, 

/7(y1/3-y,0,<^)7r^(/3^ 10) 

m]^{y*\4>,6) 


/7(y1/3-y,0,^) 

"^0 (y1</>><^) 

fo{y*\f3o,<P,5) 


= J /7(y1/3-y,0,<5)vr^(/3^ |0)d/3^, 
= J /o(y1/5o,0,(5)7r^(/^o |0)d/3o, 

OC /o(y*|/3o,0)^^'^- 


(2.5) 

( 2 . 6 ) 

(2.7) 

( 2 . 8 ) 

(2.9) 

( 2 . 10 ) 


Note that the PEP prior for the intercept term of Mq essentially reduces to the baseline 
prior; i.e. 7rg^^(/3o|0, ^) = 7r^(/do|0). Here the power-parameter 6 controls the weight that 


The 


the imaginary data contribute to the “hnal” posterior distributions of (3^ and cj). 
default choice is to set it equal to the size of the imaginary sample, i.e. 6 = n*. Under this 
approach the contribution of the imaginary data is downweighted to overall account for one 
data point, leading to a minimally-informative prior with a unit-informat ion interpretation 


(Kass and Wasserman, 1995). Furthermore, by setting n* = n we avoid the complicated 


problem of sampling over numerous imaginary design sub-matrices, as in this case we 


have that X* = X^,. As shown in Fouskakis et al. (2015) the PEP prior is robust with 
respect to the specihcation of n* and it also remains relatively non-informative even when 
the model dimensionality is close to the sample size. 

Another advantage of setting n* = n, which becomes more obvious in the GLM frame¬ 
work, is that one can now utilize large-sample approximations when needed. For instance. 


consider the baseline-posterior in (2.6), which can be expressed as 


7r^(/37ly ,0,^) oc exp 2^-- ] 7r^{(3^\(j)). 


, 2 = 1 




( 2 . 11 ) 


This unnormalized distribution is recognized as the power-prior for GLMs (Ghen, Ibrahim 
and Shao, 2000|. If we assume a flat baseline prior for f3^, i.e. 7i^{f3^\(j)) oc 1, then, based 
on standard Bayesian asymptotic theory (Bernardo and Smith, 2000|[), for n* —>■ oo the 


distribution in (2.11) converges to 


( 2 . 12 ) 


where /3* is the MLE of [3^ for data y* and design matrix X*, and is the observed 

information evaluated at f3*. Specihcally, 


= (xyw;x;)-\ 


(2.13) 


6 



























where W1 = 


diag(w;(.)) with ^ ^nd /i^(i) = h'{d^^)). It 


is straightforward to see that the asymptotic distribution in (2.12) has a ^f-prior form ac¬ 


cording to the dehnitions for GLMs presented in Ntzoufras et ah (2003) and Sabanes Bove 


and Held (2011). The familiar zero-mean representation in (2.12) arises when the covari¬ 


ates are centered around their corresponding arithmetic mean and the imaginary response 
data are all the same, i.e. y* = where !„,* is a vector of ones of size n* since 

in this case we have that f3* = 0^ ; for details see 


Ntzoufras et al. 


(2003). 


2.2 


PEP prior extensions for GLMs via unnormalized power- 
likelihoods 


The sampling distribution of the imaginary data involved in the PEP prior via (2.6), (2.7) 


case 


and (2.9) is a power version of the likelihood function. In the normal linear regression 


Fouskakis et al. (2015) and Fouskakis and Ntzoufras (2016) naturally considered the 


density normalized power-likelihood 






(2.14) 


I My'\Syy'‘dy’' 

which is also a normal distribution with variance inflated by a factor of 6. Similar results 
can be derived for specihc distributions of the exponential family such as the Bernoulli, 
the exponential and the beta distributions where the normalized power-likelihood is of 
the same distributional form. This property simplihes calculations when using the PEP 
methodology, especially for Gaussian models where the resulting posterior distribution and 
marginal likelihood are available in closed form. An application of the PEP prior using 
the normalized power-likelihood for MGMG-based variable selection in binary logistic 


regression can be found in Perrakis, Fouskakis and Ntzoufras (2015a). 


However, this property does not hold for all members of the exponential family. For in¬ 
stance, for the binomial and Poisson regression models, the normalized power-likelihoods 
are composed by products of discrete distributions that have no standard identihable 
form. Although it is feasible to perform likelihood evaluations for each observation, the 
additional computational burden renders the implementation of the PEP prior methodol¬ 
ogy time-consuming and inefficient. One possible computational solution to the problem 


would be to utilize an exchange-rate algorithm for doubly-intractable distributions (Mur¬ 


ray, Ghahramani and MacKay, 2006). However, this approach would further increase 


MGMG computational costs. 

Here we pursue a more generic approach for the implementation of PEP methodology 
in GLMs by redehning the prior itself. Namely, we consider two adaptations of the PEP 
prior which, in principle, can be applied to any statistical model and, consequently, are 
applicable to all members of the exponential family. For the remainder of this paper, 
without loss of generality we restrict the scale parameter (j) to be hxed, which is the 
case for the binomial, Poisson and normal with known error variance regression models. 
Specihcally, we assume that 0 = 1 and remove 0 from all conditional expressions to 
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alleviate notation. 


The core idea is to use the unnormalized power likelihood (2.8) and (2.10) and nor¬ 


malize the baseline posterior density (2.11), i.e. 




J )df3„ 


(2.15) 


which is also the approach of Friel and Pettitt (2008, Eq.4) in the dehnition of the power 


posterior. Given this hrst step, we proceed by proposing two versions of the PEP prior 
which differentiate with respect to the dehnition of the prior predictive distribution used 


to average the baseline posterior in (2.15) across imaginary sets. This prior predictive 


distribution can be alternatively viewed as a hyper-prior assigned to y* (Fouskakis and 


Ntzoufras, 2016). More specihcally we dehne the two PEP variants as follows. 


Definition 1 The concentrated-reference PEP prior of model parameters (3^ is de¬ 
fined as the power-posterior of (3^ in (2.15) “averaged” over all imaginary data coming 


from the prior predictive distribution of the reference model Mq based on the actual like¬ 
lihood, that is 


vr. 


CR-PEP/ 
7 






m^(y* 


mN(y*|5) 

with mo (y*) = y'/o(y1/3o)vr^(/5o)d/3o 

and m^J(y*|(5) = [ /^(y*|/3 )d/3. 


f-,{y*\(3^y/^dy* (2.16) 
(2.17) 


In order for the above prior to exist we need to consider for each model M-y similar 


assumptions as in Perez and Berger (2002), i.e. 


0 < m!J(y*|(5) 


< oo, 0 < 


m, 


N 


(y* 


f(yl^) 


mt 


/7(y*l/3,)‘'‘dy* 


< OO. 


(2.18) 


In equation (2.16), will not necessarily be proper, but still, by abusing slightly no¬ 


tation we dehne the concentrated-reference PEP prior as the expectation of 7rr{(3 \y*, 6) 


with respect to m^. Furthermore, impropriety of the baseline priors in (2.16) causes no 
indeterminacy of the resulting Bayes factors, since 7r^^“^^^(/3..y|5) depends only on the 
normalizing constant of the baseline prior of the parameter of the null model. Finally, the 
concentrated-reference PEP prior for the parameter of the null model is no longer equal 
to the baseline prior 7r^(/3o), since 


TT, 


CR-PEP 

0 


(/3o|5) = TT^^ifdo 


m^(y*|h) 


/o(y1/5o)'/'dy* 


(2.19) 

























Definition 2 The diffuse-reference PEP prior of model parameters (3^ is defined 
as the power-posterior of in (2.15) “averaged” over all imaginary data coming from 
the “normalized” prior predictive distribution of the reference model Mq based on the 
unnormalized power-likelihood, that is 


vr. 


DR-PEP 
7 


"io(yl^) _ //o(y1/5o)^/'^7r^(/^o)d/3o 


with mo(y*|5) = 


/ (y* |(5)dy* / / /o(y* |/3o)^/'^<(/5o)d/3ody* 

and m!5?(y*|5) = [ f-,{y*\(3^y^^7r^{(3^)d(3^. 


The conditions for the existence of the diffnse-reference PEP prior, for each model 
are similar to (2.18), i.e. 


0 < mN(y*|5) < oo, 0< J 


< oo. 


( 2 . 21 ) 


Again the definition of the diffnse-reference PEP prior as an expectation of 7r^^(/3..y|y*, 5) 
with respect to is slightly abnsive nnder improper baseline prior setnps. The nor¬ 
malization of mQ(y*|5) is adopted in order to retain the “expected-posterior” inter¬ 
pretation nnder proper baseline prior setnps. The indnced normalizing constant Cq = 
J (y*l'^)dy* exists nnder any proper baseline prior setnp and has no effect on the pos¬ 
terior variable selection measnres since it is common in all models nnder consideration. 
Additionally, impropriety of the baseline priors canses no indeterminacy of the resnlting 
Bayes factors, since depends only on Cq which is common across all mod¬ 

els. Finally, the diffnse-reference PEP prior for the parameter of the nnll model is no 
longer eqnal to the baseline prior, since 


_DR-PEP 

^0 


(/do 15) 


N/a J fo{y*\fioY^^dy* 
^0 (Po) p 

(-0 


J/o(y1/3o)‘^V(ft)dy- 

///o(y|/5o)'''‘<(/3o)d//ody' 


( 2 . 22 ) 


Definition 1 is a special case of Definition 2 since rrig (y*) is a special case of mg (y*|5) 
with 5 = 1. Becanse the likelihood in (2.17) is not scaled down, it provides more informa¬ 
tion from the imaginary data resnlting in a more concentrated (in relation to the alterna¬ 
tive approach) predictive distribntion. For this reason, this version is named concentrated- 
reference PEP (CR-PEP). The CR-PEP prior is also given by 


vr. 


\(3J6) = <(/3J 





mN(y*|5) 


7ri"(/?o)dyM/?o. (2.23) 


In Definition 2 the likelihood involved in mg (y*|5) in ( 2.20[ ) is raised to the power of 1/5 
and, therefore, the information incorporated in the prior predictive distribntion becomes 
eqnal to n*/5 points leading to a distribntion which becomes increasingly diffnse as 5 
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grows. Thus, this prior is coined diffuse-reference PEP (DR-PEP). Specifically, we have 
that 


vr. 


DR-PEP 


(I3JS) = C„-V"(/3J 





m 


N 


(y*|(5) 


ttq (/?o)dy*d/?o- (2.24) 


2.3 Further prior specifications 

To complete the model formulation we need to specify a baseline prior for (3^, under 
each model ilT^, and also a prior distribution on the model space At = { 0 , 1 }^, as we are 
interested in variable selection rather than single-model inference in which case 7 G At 
is fixed. In addition, we do not need to specify a prior for 0, which is considered fixed 
in our setting. For models with random (under estimation) (j), we propose working along 


the lines of Fouskakis and Ntzoufras (2016) and use a fiat prior on 0; this will just add 


one 


additional step to the MCMC algorithm presented in Section 


Baseline prior distributions for f3^ 

Common choices for the baseline prior of the regression vector (3^^ under each model 
are either the fiat improper prior 

7r)J(/3^)cxl (2.25) 

which is of the form 

7 i !;{( 3 ^) oc |X^W^(/3^)X^|1/2 . (2.26) 

For non-Gaussian GLMs Jeffreys’ prior will depend on (3^ through the matrix W.y(-); 
see Section |2.1| for details. Note that Jeffreys’ prior for the parameter of the null model 
simplifies to 7r^(/3o) cc tr(Wo(/3o))^'^^- 


or Jeffreys’ prior for GLMs (Ibrahim and Laud, [iMl ) 


Prior distributions on model space 

A common non-informative option for 7 is to use a product Bernoulli distribution where 
the prior inclusion probability of each predictor is equal to 0.5. This leads to a discrete 
uniform prior on the model space, i.e. 


7r(7) = 2 


(2.27) 


An alternative choice, better suited for moderate to large p, is to use the hierarchical prior 
design 

7 |r ~ Bernoulli(r) and r ~ Beta(l, 1), 


in order to account for an appropriate multiplicity adjustment (Scott and Berger, 2010). 
In this case the resulting prior is given by 


7r(7) 


1 

p + 1 



-1 
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(2.28) 












3 Posterior Computation 

In normal linear regression models the conditional PEP prior is a conjugate normal-inverse 
gamma distribution which leads to fast and efficients computations ( [Fouskakis and 
zoufras, 2016). For non-Gaussian GLMs there exist no convenient conjugate distributions 


and the integrals involved in the derivation of the GR/DR-PEP priors are intractable. 
However, one can work with the hierarchical model, i.e. without marginalizing over the 
imaginary data, and use an MGMG algorithm in order to sample from the joint posterior 
distribution of [3^ and y*. 

For ease of exposition, for the remainder of this section we use indicator -0 to distin¬ 
guish between the GR-PEP prior (-0 = 1) and the DR-PEP prior (-0 = 5) and we simply 
use the general term “PEP” to denote the joint posterior. Specihcally, from (2.15), (2.16) 


and ( 2 . 20 ) we have the following hierarchical form 


(/37,y*|y,<5) oc /^(y|/3.^)7r (/3^|y*,5)mo(y 


oc 




mN(y*|(5) 


IN / ^ 

^0 (y 


(3.1) 


where m^(y*|l) = 'm^(y*). A further computational problem in (3.1) relates to the 
prior predictive distributions m^{y*\6) and rn^(y*|'^) which are not available in closed 
form. One solution is to use a Laplace approximation for both. Alternatively, a more 
accurate solution is to augment the parameter space further and include parameter f3o of 
the reference model Mq in the joint posterior, thus avoiding to approximate m^(y*|' 0 ). 
Based on (2.23) and (2.24) the posterior in (|3.1 ) is expanded as 


7r™^(/3 ,/?0,y1y,5) oc/^(y|/3 ) 




mt 


!(y|i5) 




(3,2) 


which leaves us with the need of using only one Laplace approximation for ml^(y*|(5). 
Sampling from (3.2) for a single model ilA,, i.e. for a hxed conhguration of 7 , is feasible 


using standard Metropolis-within-Gibbs algorithms. Note that under flat baseline priors 
the posterior in (3.2) and the corresponding MGMG scheme are simplified. Moreover, 


under a flat baseline prior one may also consider using the normal approximation in 


(2.12) for the entire fraction appearing in (3.2), instead of using a Laplace approximation 


for the prior predictive m^{y*\6). For variable selection, which is the topic of the next 


section, we further assign a prior on 7 , based on the options discussed in Section [2l^ and 
utilize the algorithm of Dellaportas, Forster and Ntzoufr^ (2002). 


3.1 Gibbs variable selection under the PEP prior 


The Gibbs variable selection (GVS; Dellaportas et ah, 2002 ) method is a stochastic search 
algorithm based on the vector of binary indicators 7 G {0,1}^ which represents which 
of the p covariates are included in a model. To formulate GVS we need to partition 
the regression vector /3 into corresponding to those components of /3 that are 
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included and excluded from the model, i.e. j3j G /3^ if 7 ^ = 1 and /3j G (3\^ if 'jj = 0, 
for j = 1,... ,p. As we assume that the intercept term is included in all models under 
consideration, /3^ and are of dimensionality d-y = p-^ + l and d\~f = p—p-y, respectively. 
Under the GVS setting the joint prior of /3 and 7 is specihed as follows 


ir(/3,7) = ir7/3)ir(7) = ir 7 / 3 ,)!r"(^\ 7 )"‘('l')' 


(3.3) 


where the actual baseline prior choice involves only since 7 rtj(/ 3 \..y) is just a pseudo-prior 


used for balancing the dimensions between model spaces; see Dellaportas et ah (2002) 
Suitable choices for the priors of (3.^ and 7 have been discussed in Section 2.3, thus, in 


order to complete the GVS setup, we only need to specify the pseudo-prior for the inactive 
part of the regression vector [3. In particular, we use a multivariate normal distribution 
of dimensionality d\.y, with parameters specihed by the ML estimates; namely. 


= (3.4) 

where (3^..^ and are the respective ML estimates and corresponding standard errors of 
f3\^.y from the full model using the actual data y and is the d\j x d\~f identity matrix. 
Based on this formulation, the full augmented posterior used to build our MGMG has the 
following form 


7r(/3.^,/3\.^,/3o,y*,7|y,^) oc /^(y|/3^ 




m^{y*\6) 


{(3^)7T.y (/3\.^)7r(7)7ro (/?o), 

(3.5) 

where, as a reminder, -0 = 1 in the GR-PEP setting and 'll) = S in the DR-PEP setting. 
Then, the proposed PEP-GVS sampling scheme is the following: 

Set starting values and y*^°^ 

For iterations t = 1, 2,..., V: 

Step 1 : Set current values equal to (3 = (3^^ f3o = (3^ 7 = 7 *-* and y* = 

Step 2: For j = 1 , 2 , ...,p, sample 7j ~ 7r(7j|/3,7^^-, y*, y, 5 ) which is a Bernoulli 
distribution. 

Step 3: Update f3 = {f3.y, 13^..^) based on the current conhguration of 7 . 

Step 4: Sample the active effects (3^ ~ 7 r(/ 3 ..^| 7 , y*, y, 5) using a Metropolis-Bastings 


step. 


Step 5: Sample the inactive effects from the pseudo-prior in (3.4). 

Step 6 : Sample from 7r(/3o|y*, V’) oc /o(y*|/do)^^’^7r^(/9o) using a Metropolis- 
Bastings step. 
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Step 7: Sample y* from 


7r(y1/3^,/3o,7,^,'0) oc 


Uy*\f3^Y/^fo{y*\f3oY/^ 

mN(y*|5) 


using a Metropolis-Hastings step. 

Step 8 : Update the parameter values at iteration t as = f3, = Yo = 7 

and = y*. 


Note that the generation of 7 and j3~y^ (Steps 2 and 5) is straightforward since the corre¬ 
sponding conditional distributions are of known form. For the rest of parameters, (3^, Yo 
and y*, we use Metropolis-Hastings (M-H) steps. Details are provided next. 


3.2 Implementation details 

Concerning the binary inclusion indicators 7 ^, the conditional posterior distribution 
'^{'lj\f3,j\j,y*,y, 5'j is a Bernoulli distribution with success probability 0^/(1 + 0j) and 


O, = 


4, 

4. (yi4,„) 


X 


fi,, (y1/3- 


30 


i/s 


X 


%,(/3) 7r(7^.J 


X 


X 


ttN (/3) mN (y*|5) 7 r( 7 . ) 


, (3.6) 


where 7 ^-^ = ( 7 ^- = l,7\j), 7jo = ( 7 i = 0 , 7 \,) and 7 r)^(/ 3 ) = 7 r^(/ 3 .^) 7 r(J(/ 3 \.^) for 7 e 
{ 7 ^ 1 , 7 jo}- All of the quantities involved in (3.6) are available in closed form expressions 
except of the marginal likelihood mj^(y*|(5). The latter is estimated through the following 
Laplace approximation 

= (2;ri7''"|X^W,(3;)X,|-‘''7^(y-[3;)'''V73;), (3.7) 


where (3* is the MLE for data y* given the configuration of 7 and 6 X^W-),(/3* )X. 


-1 


is equal to minus the inverse Hessian matrix evaluated at /3*. Under a Jeffreys baseline 
prior for f3^, the Laplace approximation simplifies to m!^(y*|J) = 

For the active effects (3^ of model M~^ and the intercept term Yo of fhe reference 
model Mo, we use independence sampler M-H steps. Specifically, for (3^ we generate new 
candidate values as 

/3;~g(/3;)=N,^(3^^E^an), 


where (3^^ is the ML estimate from a weighted regression on y^" = (y, y*)^, using weights 
wall _ and E^aii is the estimated variance-covariance matrix of /3^". The 

proposed move is accepted with probability 


= min 


A (y 1/37) 
’ A (y 1/^7) 


7rN(/3^)g(/3:y) 
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where /3^ denotes the current value of the chain. The proposal distribution of /do is 
?(/^o) = N(/do,'05'^g) with /do and a/j^, being the respective ML estimate of /do and the 

standard error of /do from the null model with response data y*. The proposed move is 
accepted with the usual M-H transition probability where the likelihood of the reference 
model is raised to the power of l/^’- Note that no specihc hne tuning is required for the 
proposal distributions of [3^ and /do- 

Finally, for the generation of the imaginary data we propose candidate values y*' from 
a proposal distribution q{y* ) and accept the proposed move with probability 

. . ( giy*) 


where the marginal likelihood estimates are obtained through (3.7) and y* denotes the 
current value of the chain. The joint proposal density is formed by the product of in¬ 
dependent distributions, i.e. qiy*) = YYi=i<i{yi)) where the proposal of each imaginary 
observation y* is constructed by combining the two likelihood components of the PEP 
prior. Hence, for the logistic regression model we use 


g(|/*) = Binomial(Nj, TT*) with vr* 


i/P 


.1/5 


l/p IS 

TTn TT 
0 


+ (1 - 7ro)Vh(l - 7r^(i))V'5' 


where tto = (1 -|- exp(—/do)) 7r-y(j) = (1 + exp(—X-y(j)/(3..^)) ^ and Ni denotes the number 

of trials of the observed data. Equivalently, for Poisson regression models we consider 


g(|/*) = Poisson 



1/5 

7(i) 


for the CR-PEP prior; where Aq = exp(/do) and = exp(X-y(j)/(3...^). For the DR-PEP 
prior, the corresponding choice of a Poisson proposal with mean (AoA-y)^'^^ was not found 
to be efficient in practice. Therefore, we use instead a Poisson random-walk proposal with 
mean equal to the value of y* at the current iteration. 

A complete and thorough description of the PEP-GVS algorithm as implemented in 
this work is provided in algorithmic form at the electronic appendix of this paper. 


4 Hyper-^ extensions 


The initial PEP prior for the normal regression model can be interpreted as a mixture of 
5 f-priors where the power parameter <5 is equivalent to g and the mixing density is the prior 


predictive of the reference model (Fouskakis et al., 2015). Thus, under the PEP approach 


we assign a hyper-prior on the imaginary data y*, rather than to the variance multiplier. 


i.e. the power-parameter <5. As discussed in Section |2.1[ the same representation holds 
asymptotically in the GLM setting given a flat baseline prior. From this perspective, a 
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natural extension of the PEP methodology arises by introducing an extra hierarchical level 
to the model formulation via the assignment of a hyper-prior on 6. Under this approach 
one can estimate the power-parameter instead of a-priori set it equal to a hxed predehned 
value. It should be noted, however, that when S is not fixed at n*, then PEP priors loose 
their unit-information interpretation. 

The hyper-(5 CR/DR-PEP priors can be approximately expressed as 

^CR/DR-PEP('^^^ _ y y 


under the baseline prior oc 1; where -0 = is the ML estimate given 

the imaginary data, is given in (2.13) and /Nd^(') denotes the (i-),-dimensional 

multivariate normal distribution. Sensible options for 7t{6) are the hyper-analogues 
proposed in Liang et al.| (|2008). Specihcally, we consider the hyper-5 prior 


Mi) = 


(4.2) 


which corresponds to a Beta(l, | — l) for the shrinkage factor Thinking in terms 


of shrinkage, Liang et al. (2008) propose setting a = 3 in order to place most of the 


probability mass near 1 or a = 4 which leads to a uniform prior. An alternative option is 
the hyper-5/u prior given by 


7r(5) = 



-a/2 


(4,3) 


In principle, any other prior from the related literature can be incorporated in the PEP 


design; for instance, the inverse-gamma hyper prior of 

Zellner and Siow 

(1980 

) or the 

recent ^f-prior mixtures proposed by 

Maruyama and George 

(2011 

) and 

Bayarri, Berger, 


Forte and Garcia-Donato (2012). 


Of course, when working outside the context of the normal linear model the integration 


in (4.1) with respect to S will not be tractable. Therefore, in order to incorporate the 


stochastic nature of 5 we need to introduce one additional MCMC sampling step. In this 
case the augmented posterior is given by 


7r(/3.y, /3o, y*, 7, ^|y) « 7r(/3^, /3\^, /?o, y*, 7|y, ^)7r(5). 


(4.4) 


where the hrst quantity in the right-hand side of (4.4) is given in (3.5). The corresponding 
full conditionals we wish to sample from are 


^CR OC 

7rDR“PEP(5|/3 ,/3o,0,7,y*,y) « 


mN(y*|5) 

m^{y*\6) 


(4.5) 

(4.6) 
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Looking at the above expressions, a subtle point is that S is not directly linked to the 
actual data y; however, it is linked indirectly via the posterior values of the parameters of 


models (for both approaches) and Mq (for the DR-PEP prior). Sampling from (4.5) 
or (4.6) is achieved by adding one simple step (after Step 7) in the PEP-GVS algorithm 
described in Section 3.1[ Specihcally, we use a random walk M-H step where we propose 
a candidate value 6' from 


q{S'\6) = Gamma ( j , 


which has mean equal to the current value 6 and variance s^. The latter is a tuning pa¬ 
rameter which can be specified appropriately in order to have an acceptance rate between 


The value of = h 


0.2 and 0.5, as recommended by Roberts and Rosenthal (2001). 
proved to be efficient in the examples presented in Section [5] Given this proposal, the 
new candidate S' is accepted with probability as = min(l. As), with As given by 


( 5 \ 


\ 5 ') 

./7(yl3;). 


{ V «} 




where ip' = 'ip = 1 for the GR-PEP prior and ip' = 6', 'ip = 6 for the DR-PEP prior. This 
acceptance probability is derived from the marginal likelihood Laplace approximation 


presented in (3.7), keeping only the terms that include 6. The analytic description of the 


PEP-GVS algorithm found in the electronic appendix includes the additional sampling 
step discussed here. 


5 Illustrative examples 

In this section we present first a simulation study for logistic and Poisson regression taking 
into account independent and correlated predictors as well as different levels of sparsity 
for the true model. We proceed with a simulation study for logistic models where the 
number of predictors is larger and the correlation structure is more complicated. The 
section concludes with a real data example for binary responses. 

In all illustrations we consider the GR-PEP and DR-PEP priors (introduced in Section 
2.2) and their hyper-(5 and hyper-(5/n extensions (presented in Secti on [Ij) w ith parameter 
a = 3, which is one of the main options proposed in Liang et al. (2008). For all PEP 
prior conhgurations we consider n* = n and X* = X.^, where the columns of the design 
matrix are centered around their corresponding sample means. For fixed 6 we consider 
the default unit-information approach, i.e. 6 = n*. Jeffreys’ prior for GLMs, given in 


(2.26), is used as baseline prior for f3. 


7 ' 


We compare the PEP variants with standard ^f-prior methods, using the GLM g- 


prior formulation of Sabanes Bove and Held (2011) for the parameters of the predictor 


variables and a flat improper prior for the intercept term. In particular, we consider the 
unit-information ^f-prior (with g = n) and three mixtures of ^f-priors; namely, the hyper-^f 
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and hyper-^f/n priors with a = 3 (Liang et al., 2008), and the beta hyper prior proposed by 


Maruyama and George (2011). Henceforth, the latter will be referred to as MG hyper-^f. 


Stochastic model search under these approaches is also implemented via GVS sampling. 


5.1 Simulation study 1 


In this hrst example we consider two simulation scenarios for logistic and Poisson regres¬ 
sion, presented in Hansen and Yu (2003) and Ghen et al. (2008), respectively. Both of 


these scenarios are also considered by Li and Glyde (2015). The number of predictors is 


p = 5 in the logistic model and p = 3 in the Poisson model, where each predictor is drawn 
from a standard normal distribution with pairwise correlations given by 

corr(Xj, Xj) = 1 < * < j < p. 

Goncerning the correlations between covariates we examine two cases: (i) independent 
predictors (r = 0) and (ii) correlated predictors (r = 0.75). In addition, four sparsity 
scenarios are assumed; the true data-generating models are summarized in Table [T} For 
the logistic case we use the same sample size as in Hansen and Yu (2003), namely n = 100, 
but with lower effects in order to reflect more realistic values of odds ratios. Given the 
coefficients in Table the odds ratios are approximately equal to 2, 2.5 and 3.5 for the 
sparse, medium and full models, respectively. For the Poisson simulation we use the same 


regression coefficients as in Ghen et al. (2008), but with sample size equal to n = 100 


Each simulation is repeated 100 times. 


Scenario 

/3o 

Logistic 

/3l f^2 

CO 

100) 

/?4 

/Ss 

Poisson 

/^O /^1 

(n = 
^2 

100) 

Pz 

null 

0.1 

0 

0 

0 

0 

0 

-0.3 

0 

0 

0 

sparse 

0.1 

0.7 

0 

0 

0 

0 

-0.3 

0.3 

0 

0 

medium 

0.1 

1.6 

0.8 

-1.5 

0 

0 

-0.3 

0.3 

0.2 

0 

full 

0.1 

1.75 

1.5 

-1.1 

-1.4 

0.5 

-0.3 

0.3 

0.2 

-0.15 


Table 1: Logistic and Poisson regression scenarios for Simulation Study 1 using indepen¬ 
dent (r = 0) and correlated predictors (r = 0.75). 


Since the number of predictors in both regression models is small, we assign a uniform 
prior on model space as given in (2.27). Results based on the frequency of identifying 
the true data-generating model through the maximum a-posteriori (MAP) model for the 
logistic regression simulation are summarized in Table The comparison between the 
PEP prior approaches versus the rest of the methods indicates the following: 


i) Overall the PEP based variable selection procedures perform well, since in 5 out of 
the 8 simulation scenarios the “best” prior for identifying the true model is at least 
one of the PEP priors. 


ii) The PEP procedures outperform all methods under the null and sparse simulation 
scenarios. 
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iii) Under the medium model scenario the PEP priors perform equally well to the rest 
of the methods in the case of independent predictors and slightly worse in the case 
of correlated predictors. 

iv) Under the full model scenario g-pnor methods perform better than PEP priors. 
This is no surprise as PEP priors tend to support more parsimonious solutions in 
general. 
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Posterior inclusion probability Posterior inclusion probability Posterior inclusion probability Posterior inclusion probability 


Logistic regression simulation: independent predictors 


Null model 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 8 
CR-PEP hyper 8/n 
DR-PEP 
DR-PEP hyper 8 
DR-PEP hyper 8/n 


Sparse model: Xi 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 8 
CR-PEP hyper 8/n 
DR-PEP 
DR-PEP hyper 8 
DR-PEP hyper 8/n 


Medium model: Xi +X2+X3 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 8 
CR-PEP hyper 8/n 
DR-PEP 
DR-PEP hyper 8 
DR-PEP hyper 8/n 


Full model: Xi +X2+X3+X4 + X5 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 8 
CR-PEP hyper 8/n 
DR-PEP 
DR-PEP hyper 8 
DR-PEP hyper 8/n 


Variables 


Figure 1: Posterior inclusion probabilities for Simulation Study 1 from 100 replicated 
samples of the null, sparse, medium and full logistic regression model scenarios with 
independent predictors (r = 0). 
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Logistic regression simulation: correlated predictors 


Null model 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 8 
CR-PEP hyper 8/n 
DR-PEP 
DR-PEP hyper 8 
DR-PEP hyper 8/n 


Sparse model: Xi 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 8 
CR-PEP hyper 8/n 
DR-PEP 
DR-PEP hyper 8 
DR-PEP hyper 8/n 


Variables 


Medium model: Xi +X2+X3 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 8 
CR-PEP hyper 8/n 
DR-PEP 
DR-PEP hyper 8 
DR-PEP hyper 8/n 


Full model: Xi +X2+X3+X4 + X5 



Figure 2: Posterior inclusion probabilities for Simulation Study 1 from 100 replicated 
samples of the null, sparse, medium and full logistic regression model scenarios with 
correlated predictors (r = 0.75). 

With respect to the comparison between the CR-PEP and DR-PEP priors we hnd no 
obvious differences between the two approaches for hxed 6 = n. Concerning the hxed 6 
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approach versus the hyper-5 and 5/n extensions, we see that under the DR-PEP approach 
the results are more or less the same in terms of MAP model success patterns. However, 
this is not the case under the CR-PEP approach as the hyper-5 prior support more complex 
models than the £xed-5 prior, while the hyper-5/n prior is somewhere in the middle. 
Interestingly, a similar pattern is observed among the ^f-prior and the hyper-^f, hyper- 
g/n priors. This pattern is identihed more clearly by examining the resulting posterior 
inclusion probabilities; the corresponding boxplots under each method and simulation 
scenario are presented in Figurefor the case of independent predictors and in Figure]^ 
for the case of correlated predictors. As we can see the DR-PEP design is quite robust 
with respect to the choice between fixed versus random 5. Also, within the category of 
^f-prior mixtures the MG hyper-prior seems to have the strongest shrinkage effect. 

The MAP-model results from the Poisson simulations are presented in Table Box- 
plots of posterior inclusion probabilities under each method and simulation scenario are 
presented in Figure (for independent predictors) and in Figure (for correlated pre¬ 
dictors). Overall, conclusions are similar to the logistic case. Specifically, looking at 
the differences between the PEP priors and all versions of gf-priors, we conclude to the 
following: 

i) The PEP procedures perform overall satisfactory; 6 out of the 8 best MAP success 
patterns are spotted by one of the PEP based methods. 

ii) The PEP procedures perform overall well under sparse conditions, i.e. under the 
null or the sparse models. 

hi) For the medium complexity scenarios, the hyper-^f and hyper-5 CR-PEP priors yield 
the best results; however, under the correlated predictors scenario the true model is 
rarely traced by any method. 

iv) For the full model with independent covariates, the MAP success rates under all 
methods are low; the hyper-gf has the highest rate but with the hyper-5 CR-PEP 
prior being close and rather competitive. For the full model with correlated covari¬ 
ates, all methods fail; the hyper-5 CR-PEP prior has the highest success rate which 
is only 3%. 

With respect to the various PEP prior distributions, the comparison in the Poisson case 
leads to the same hndings as in the logistic regression case. Again, the most interesting 
Ending is that inference under the DR-PEP prior is not afiected by the choice of fixed 
versus random 5. On the contrary, this is not the case for the CR-PEP prior, where the 
hyper-5 extension systematically supports more complex models. To a lesser extend the 
same holds for the CR-PEP hyper-5/n prior. 

As a final remark, we note that all priors yield lower MAP success rates under the 
null scenario of the logistic simulations compared to the corresponding rates observed in 
the Poisson simulations. On the other hand, under the full scenario, the MAP success 
rates are higher in the logistic simulations. This can be attributed to the fact that the 
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regression coefficients in the Poisson simulation are smaller in absolute value than the 
corresponding coefficients of the logistic formulation; see Table 


Poisson regression simulation: independent predictors 


Null model 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 6 
CR-PEP hyper 6/r7 
DR-PEP 
DR-PEP hyper 6 
DR-PEP hyper 6/r? 


Sparse model: Xi 



hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 6 
CR-PEP hyper 6/r? 
DR-PEP 
DR-PEP hyper 6 
DR-PEP hyper 6/r? 


Medium model: Xi-t-X2 


QliPi nj 



UIIU 

i!H ifll. 

T i 

31iSBs 




Variables 


Full model; Xi-i-X2+X3 


OHIO 


i M n i I 1 1 ! 11 ! M 

_i 1 ^ I i _ 

* _L "" i 

* ° s -h 

'pN niiy 

“i| olyiBgi 


"x^ x^ ^ 


Variables 


Figure 3: Posterior inclusion probabilities for Simulation Study 1 from 100 replicated 
samples of the null, sparse, medium and full Poisson model scenarios with independent 
predictors (r = 0). 
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Posterior inclusion probability Posterior inclusion probability Posterior inclusion probability Posterior inclusion probability 


Poisson regression simulation: correlated predictors 


Null model 


^8 ■> -y , ' T 


isi : ‘gi' ■ ; ion ian ; ; iBI i y 

t 1 ■ ; - ' ™ 4- - 1 + ; ™ ^ A i g 4- 4- : f i ■ ' ^ ^ 

^ T PPh 1 ? t ' 1 s- PBb 

Xi X, Xp 

11 i 


Variables 


hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 6 
CR-PEP hyper 6/r? 
DR-PEP 
DR-PEP hyper 6 
DR-PEP hyper 6/r? 


Sparse model: Xi 


nllir 

: ! ! ! ! » » * □ * 

° 0 ! 1 ] ! ° • ° 

It: ' ^ , I t ° ° 

Wc 

•• ^. ■ 

^m. 

Hb : g- Ti:i; 

jlQBg g.flaiBpgg 


Variables 

Medium model: X1+X2 


hyperg 
hyperg/n 
MG hyper g 
CR-PEP 
CR-PEP hyper 6 
CR-PEP hyper 6/r? 
DR-PEP 
DR-PEP hyper 6 
DR-PEP hyper 6/r? 



Variables 


Full model; X1+X2+X3 


flNffli Ml 

y- 4- » 

; ; 1 ; 1 » ® t 

B ; : 1 A ' 

3 0 8 

T 4- 1 ” “ 

i ; i “ 1 

PiflDifQdl pB||y 

i'ii 

IflliagH 


"x^ x^ ^ 


Variables 


Figure 4: Posterior inclusion probabilities for Simulation Study 1 from 100 replicated 
samples of the null, sparse, medium and full Poisson model scenarios with correlated 
predictors (r = 0.75). 
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5.2 Simulation study 2 


In this illustration we consider a more sophisticated scenario with p = 10 potential pre¬ 


dictors (1024 models) and a more intriguing correlation structure. Similar to Nott and 


Kohn (2005), the hrst hve covariates are generated from a standard normal distribution, 


while the remaining hve covariates are generated from 


^ij — A^(0.3Xji -|- 0.5Xj2 -l- 0.7Xj3 -|- 0.9Xj4 -|- l.lXjs, 1), 

for i = 1,..., n and j = 6 ,..., 10. We assume that sample size n is 200 and consider 
the three logistic regression data-generating models which are summarized in Table the 
resulting odds ratios for the sparse and dense simulation models are approximately equal 
to 2 and 3, respectively. Each simulation is repeated 100 times. 


Scenario 

/3o 


^2 

Logistic 

/^S /^4 

[n = 

200) 

1^6 

^7 

Ps 

^9 

Pio 

null 

0.1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

sparse 

0.1 

0 

0 

-0.9 

0 

0 

0 

1.2 

0 

0 

0.4 

dense 

0.1 

0.6 

0 

-0.9 

0 

1 

0.9 

1.2 

-1.2 

-0.5 

0 


Table 4: Three logistic simulation scenarios for Simulation Study 2. 


In this example we use the beta-binomial prior (2.28) with a Beta(l,l) mixing dis¬ 
tribution; see Scott and Berger (2010). The comparison that follows is based on the 
posterior inclusion probability of each covariate. Figures and present boxplots of 
the posterior inclusion probabilities from the 100 simulated data sets for the null, sparse 
and dense simulation scenarios, respectively. 

Under the null scenario, all priors, except the hyper-p prior, exhibit strong shrinkage 
on the inclusion probabilities. Under the hyper-p prior relatively large posterior inclusion 
probabilities with high variability across different samples are obtained. The hyper-(5 
CR-PEP prior also induces more variability, however, the resulting inclusion probabilities 
under this method are quite lower in comparison to those obtained under the hyper-^f 
prior. 

Under the sparse simulation scenario (true model: X1-I-X7-I-X10), there are no striking 
differences among methods. All priors provide very strong support for the inclusion of X7 
and sufficient support for the inclusion of X3, although the variability under PEP priors 
is larger for the latter variable. Moreover, all methods yield very wide posterior inclusion 
probability intervals for predictor Xio, implying high posterior uncertainty concerning 
the inclusion of this variable. For the non-important variables we observe that the £xed-(5 
CR-PEP and the DR-PEP priors yield the lowest posterior inclusion probabilities. 

Finally, in the dense simulation scenario (Figure]^, where the true model is X1-I-X3-I- 
X5 -|- Xg -|- X7 -|- Xg -|- Xg, the £xed-(5 PEP priors generally outperform other methods in 
terms of providing low posterior inclusion probabilities for the insignihcant covariates Xg, 
X4 and XiQ. The ^f-prior and the hyper DR-PEP extensions yield similar posterior inclu¬ 
sion probabilities and generally perform well, however, they introduce some uncertainty 
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■ g prior 

■ hyperg 

■ hyperg/n 

■ MG hyper g 

■ CR-PEP 

■ CR-PEP hyper S 

■ CR-PEP hyper 8//7 

■ DR-PEP 

■ DR-PEP hyper 5 

■ DR-PEP hyper 8/r? 


X, 


X2 X3 X4 


X5 


Xe X7 


Xs 


X9 


X10 


Variables 


Figure 5: Posterior inclusion probabilities for Simulation Study 2 under the various priors 
from 100 repetitions of the null logistic simulation scenario. 



X-| X2 X3 X4 X5 Xe X7 X3 Xg X-jo 

Variables 


Figure 6 : Posterior inclusion probabilities for Simulation Study 2 under the various priors 
from 100 repetitions of the sparse logistic simulation scenario where the true model is 
X3 + X7 + XlQ. 

concerning the inclusion of covariate X 4 . The rest of the methods systematically support 
more complex models as they provide elevated support for the inclusion of variables X 2 
and X 4 . 
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Xs Xe X7 

Variables 



I 9 prior 
I hyper g 
I hyper g/n 
I MG hyper g 
I CR-PEP 
I CR-PEP hyper 5 
I CR-PEP hyper 6/n 
I DR-PEP 
I DR-PEP hyper 5 
I DR-PEP hyper 8/n 


Figure 7: Posterior inclusion probabilities for Simulation Study 2 under the various priors 
from 100 repetitions of the dense logistic simulation scenario where the true model is 
Xi + X3 + X5 + Xq + Xy + dfg + Xg. 


5.3 A real data example 


In our last example we consider the Pima Indians diabetes data set ( 

Ripley, 1996), which 

has been analyzed in several studies (e.g. 

Holmes and Held 

2006 

Sabanes Bove and Held 


2011). The data consist of n = 532 complete records on diabetes presence (present=l, not 
present=0) according to the WHO criteria for signs of diabetes. The presence of diabetes 
is associated with p = 7 potential covariates which are listed in Table 

For each method we used 41000 iterations of the GVS algorithm, discarding the hrst 


1000 as burn-in period. We assigned a beta-binomial prior on model space (see Eq. 2.28) 


Table shows the posterior inclusion probabilities of each covariate under the various 
methods. For comparison with the results presented in Sabanes Bove and Held (2011), 
we also include in Table the resulting posterior inclusion probabilities from the Zellner 


and Siow (1980) inverse gamma (ZS-IG) prior, the hyper-g/n with a = 4, and a non- 
informative inverse gamma (NI-IG) hyper-g prior with shape and scale equal to 10“^. As 
seen, the posterior inclusion probabilities that we obtain from the GVS algorithm are in 


agreement with the results presented in Sabanes Bove and Held (2011). 


For the covariates Xi,X 2 ,X 5 and Xg, which seem to be highly influential, the results 
in Table 1^ show no signihcant differences among methods. On the contrary, the posterior 
inclusion probabilities for the “uncertain” covariates X 3 ,X 4 and X 7 vary substantially; 
specihcally, the inclusion probabilities from the £xed-(5 GR/DR-PEP priors, the hyper- 
6/n DR-PEP prior and the g-prior are considerably lower than the inclusion probabilities 
resulting from the rest of the methods. In terms of the shrinkage factors g/(g -|- 1) and 
S/{6 -|- 1 ), results show that the shrinkage effect is stronger when g or 5 is fixed, which 

























































































































































































Covariate 

Description 


Number of pregnancies 

X 2 

Plasma glucose concentration (mg/dl) 

X 3 

Diastolic blood pressure (mm Hg) 

X 4 

Triceps skin fold thickness (mm) 

X 5 

Body mass index (kg/m^) 

Xe 

Diabetes pedigree function 

X 7 

Age 


Table 5: Potential predictors in the Pima Indians diabetes data set. 


leads to a drastic reduction in the effects (and the inclusion probabilities) of low-influential 
covariates. On the other hand, the priors with random g or 6 clearly result in higher 
posterior inclusion probabilities. Among this category of priors, the hyper-^/n DR-PEP 
is evidently the most parsimonious, as it yields posterior inclusion probabilities which are 
actually quite close to those obtained from fixed 6 PEP priors. 

The uncertainty of the estimated posterior inclusion probabilities, for the standard 
methods considered in the previous examples, is depicted in Figure where we present 
the corresponding boxplots produced by splitting the posterior samples into 40 batches 
of size 1000. As seen in Figure stochasticity in g and 6 mainly affects the posterior 
inclusion probabilities of the “uncertain” covariates X 3 , X 4 and X 7 . For these variables the 
extra prior uncertainty induces higher posterior variability, as expected, and consequently 
larger Monte Carlo errors. Apart from this, we observe the same patterns of evidence as 
in Sections 5.1 and 5.2 Among the mixtures of ^f-priors, the MG hyper-^f prior induces 


Method 


^2 

Predictor 

X3 X4 X5 

^6 

X 7 

ZS-IG hyper-^f 

0.961 

1.000 

0.252 

0.250 

0.998 

0.994 

0.530 

NI-IG hyper- 5 ( 

0.967 

1.000 

0.349 

0.341 

0.998 

0.996 

0.622 

^f-prior {g = n) 

0.952 

1.000 

0.136 

0.139 

0.998 

0.992 

0.382 

hypei-g (a = 3) 

0.970 

1.000 

0.397 

0.379 

0.998 

0.996 

0.669 

hyper-g/n (a = 3) 

0.966 

1.000 

0.304 

0.300 

0.998 

0.995 

0.579 

hyper-g/n (a = 4) 

0.965 

1.000 

0.307 

0.299 

0.997 

0.995 

0.582 

MG hyper-gf 

0.958 

1.000 

0.262 

0.259 

0.998 

0.994 

0.548 

GR-PEP 

0.948 

1.000 

0.100 

0.104 

0.998 

0.987 

0.339 

GR-PEP hyper-5 

0.964 

1.000 

0.296 

0.291 

0.998 

0.995 

0.602 

GR-PEP hyper-5/n 

0.956 

1.000 

0.223 

0.225 

0.998 

0.992 

0.520 

DR-PEP 

0.948 

1.000 

0.102 

0.104 

0.997 

0.988 

0.324 

DR-PEP hyper-5 

0.954 

1.000 

0.174 

0.173 

0.997 

0.991 

0.442 

DR-PEP hyper-5/n 

0.951 

1.000 

0.125 

0.120 

0.998 

0.987 

0.346 


Table 6 : Posterior inclusion probabilities for the seven covariates of the Pima Indians 
data set. 
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the strongest effect on posterior inclusion probabilities, followed by the hyper-^f/n and 
the hyper-^f priors. Similarly, posterior inclusion probabilities under the hyper-5/n PEP 
priors are lower than those resulting from the hyper-5 PEP priors. In addition, the DR 
design leads to a more stringent control for inclusion of variables in relation to the CR 
prior setup. 

Figure [^depicts the convergence and the estimated posterior distribution of the shrink¬ 
age parameter 5/(1 -|- 5) under the four PEP hyper-prior approaches. The posterior his¬ 
tograms are indicative of the behavior of the shrinkage parameter. Comparison between 
the hyper-5 (Figure]^) and the hyper-5/n (Figure]^) approaches shows that the poste¬ 
rior distribution of the shrinkage parameter under the latter priors is more concentrated 
to values close to one, thus, resulting to a stronger shrinkage effect. Also, the histograms 
in Figures and indicate that the posterior distributions of the shrinkage parameter 
under DR-PEP are more concentrated to one in comparison to the corresponding poste¬ 
riors under CR-PEP. Note that the shrinkage under the fixed-5 approaches is constant, 
equal to 0.998, which leads to considerably lower posterior inclusion probabilities as seen 
in Table and Figure 

We conclude this example by examining the out-of-sample predictive accuracy of the 
various prior setups. We randomly split the data set in half in order to create a training 
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Figure 8: Boxplots of batched estimates of the posterior inclusion probabilities for the 
seven predictors in the Pima Indians data set based on 40 batches of size 1000. 
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sample and a test sample and identified the corresponding MAP and the median prob¬ 
ability models, under all methods, from the training data set. Then, based on posterior 
samples from the predictive distribution we calculated the averages of false positive and 
false negative percentages for the test data set; the results are reported in Table Over¬ 
all, we cannot say that there is dominant method in terms of predictive accuracy as the 
predictions are more or less the same across the prior setups. We may note however that 
the most complex MAP model arises from the hyper-^f prior which also results in the high¬ 
est false negative prediction rates. Also, the unit-information gf-prior, the CR-PEP prior 
with fixed 6 , and the DR-PEP priors lead to the most parsimonious median probability 
model. This model is comparable in terms of predictive performance with the model that 
further includes Xj, which is indicated as the median probability model by the rest of the 
methods. 


CR-PEP hyper-5prior « DR-PEP hyp6r-5prior 



CR-PEP hyper-5/nprior ^ DR-PEP hyp6r-5/nprior 



MCMC iterations 


MCMC iterations 


MCMC iterations 


MCMC iterations 




10000 20000 30000 40000 



0 10000 20000 30000 40000 


I i jnw|]yr^ 


10000 20000 30000 40000 


10000 20000 30000 40000 


MCMC iterations 


MCMC iterations 


MCMC iterations 


MCMC iterations 



0.88 0.92 0.96 1.00 0.975 0.985 0.995 

8 /( 1 + 8 ) 5 /( 1 + 8 ) 



0.92 0.94 0.96 0.98 1.00 

5 /( 1 + 8 ) 


0.980 0.985 0.990 0.995 1.000 
5 /( 1 + 5 ) 


(a) Hyper-(5 Priors 


(b) Hyper-(5/n Priors 


Figure 9: Ergodic mean plots, time-series plots and histograms of the shrinkage factor 
5/{I + 5) for the hyper-b and hyper-b/n PEP priors based on 40000 draws. 
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Method 

MAP 

False 

Neg. 

(%) 

False 

Pos. 

(%) 

MPM 

False 

Neg. 

(%) 

False 

Pos. 

(%) 

5 -prior {g = n) 

Ma 

10.8 

16.5 

Ma 

10.8 

16.5 

hyper -5 (a = 3) 

■A4.A + A 3 -|- A 4 -|- Xj 

11.4 

16.9 

Ma + Ay 

11.1 

16.8 

hyper- 5 /n (a = 3) 

Ma 

11.0 

16.6 

Ma + Ay 

11.0 

16.6 

MG hyper -5 

Ma 

10.9 

16.6 

Ma + Ay 

10.9 

16.6 

CR-PEP 

Ma 

10.9 

16.9 

Ma 

10.9 

16.9 

CR-PEP hyper-(5 

Ma 

10.9 

17.0 

Ma + Ay 

11.3 

16.4 

CR-PEP hyper-(5/n 

Ma 

10.8 

17.0 

Ma + Ay 

11.0 

16.6 

DR-PEP 

Ma 

10.9 

16.8 

Ma 

10.9 

16.8 

DR-PEP hyper-(i 

Ma 

10.9 

16.9 

Ma 

10.9 

16.9 

DR-PEP hyper-d/n 

Ma 

10.9 

16.8 

Ma 

10.9 

16.8 


A4a ■ + ^2 + ^5 + 


Table 7: Percentages of false negative and false positive detections for the Pima Indian 
data set under the MAP model and median probability model (MPM) for the various 
priors. 


6 Discussion 


In this paper we presented an objective, automatic and compatible across competing 
models Bayesian procedure with applications to the variable selection problem in GLMs. 
Specihcally we extended the PEP prior formulation through the use of unnormalized 
power-likelihoods and dehned two new PEP priors, called CR-PEP and DR-PEP, which 
differentiate with respect to the dehnition of the prior predictive distribution of the ref¬ 
erence model. Under the new dehnitions, the applicability of the PEP methodology is 
signihcantly enhanced. Although that we focused on variable selection for GLMs, the 
GR/DR-PEP priors proposed here may in principle be used for any general model set¬ 
ting. At the same time the new approaches retain the desired features of the original prior 
formulation; specihcally, i) they resolve the problem of selecting and averaging across min¬ 
imal imaginary samples, thus, also allowing for large-sample approximations, and ii) they 
are minimally informative as they scale down the effect of the imaginary data on the 
posterior distribution. We further studied the assignment of hyper-prior distributions to 
the power-parameter 5 that controls the contribution of the imaginary data. Following 


the hyper-gf and g/n priors proposed in Liang et al.| (2008), we effectively introduced the 
hyper-(5 and 5/n analogues. 

The empirical results presented in this paper suggest that the proposed PEP priors 
outperform mixtures of ^f-priors in terms of introducing larger shrinkage to the inclusion 
probabilities of non-inhuential or partially inhuential predictors, thus, leading to more 
parsimonious solutions with comparable predictive accuracy. When comparing PEP pri¬ 
ors with hxed {6 = n) and random 6 the results indicate that the former approach induces 
more stringent control in the inclusion of predictors. Therefore, hxed PEP priors support 
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simpler models which is a desirable feature when the number of covariates is large. Con¬ 
cerning the choice between the CR and the DR prior setups, we conclude in favour to the 
use of the latter since it is rather robust with respect to the fixed vs. random specihcation 
of 

In the near future, we aim to investigate further the theoretical properties of the 
PEP extensions. So far we have proofs, which are available in an earlier technical report 
(Perrakis, Fouskakis and Ntzoufras, 20156), that both extensions result in model selec¬ 
tion consistency for the case of the Gaussian linear model. We intend to extend this 
work within the formal GLM framework, possibly overcoming the problem of analyti¬ 
cal intractability through Laplace-based approximations. In addition, we are currently 
working on extensions of the PEP methodology to high-dimensional problems, that in¬ 
clude the small n-large p case, by incorporating shrinkage priors (e.g. ridge and LASSO 
procedures) into the PEP design. Finally, another promising alternative is to embody 
the expectation-maximization variable selection approach of Rockova and George (2014) 
within the PEP prior. 
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Electronic Appendix for the the paper entitled “Power- 
Expected-Posterior Priors in Generalized Linear Mod¬ 
els’” by D.Fouskakis, I.Ntzoufras and K.Perrakis. 


The PEP-GVS algorithm 


Given the posterior distribution in Eq. |3.5[ with ip = 1 for the CR-PEP prior and ip = 5 
for the DR-PEP prior, the PEP-GVS sampler proceeds as follows: 

A. Set starting values and For hxed 5 set 5 = n, for 

random 5 set starting starting value 


B. For iterations f = 1, 2,A: 

Step 1: Sampling of for j = 1,2, given the current state of /3^, 7 ^^, y* 
and S. 


(a) Galculate the MLEs under 7 ^-^ = ( 7 ^ = 1 , 7 ^^), = {■jj = 0 , 7 ^^) and 

compute the Laplace approximations (y*!*^), (y*|<^) through Eq. 

K7\ 

(b) Evaluate the odds: 


O, = 


G hiy.) 

G(yiA„) 


,4o(y1/3, 




X 


(y1<^)7r(7 


(c) Sample 7 '- 


Bernoulli 




o, 


1+07 


and set = 7 ’- with probability equal to 1 . 


Step 2: Update /S*-* 



based on the current conhguration of 7 . 


Step 3: Sampling of 0^^ given the current state of 7 , y* and 5. 

(a) Generate [3^ from the proposal distribution q{l30 = E^aii), where 

is the ML estimate from a weighted regression on y^" = (y, y*)^, using 
weights = ( 1 „, and S^aii is the estimated variance-covariance 

P 7 

matrix of 

(b) Galculate the probability of move: 


My\P'0 

( Mr\P'0 ) 

-J ' 

1 

_1 

7 

_1 

> 

* 

1 

7 (/ 37 ‘’) 9 (/ 3 ;) 
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c) Set = I probability 

^ ^ ^ with probability 1 - 


Step 4: Sampling of B\}f given the current state of 7 . 

(a) Generate B'\-y from the pseudo-prior 7i^{B'\-y) = 

(3\~^ and a/ 3 ^_^ are the respective MLEs and corresponding standard errors 
of B\-y from the full model given data y. 

(b) Set (3\l = P\-y with probability equal to 1. 


Step 5: Sampling of (3^'^ given the current state of y* and 6 . 

(a) Generate Bo from the proposal distribution q{Bo) = N(/3o, where Bo 

and BjSg are the respective MLE of Bo and the standard error of Bo from 
the null model given data y*. 

(b) Galculate the probability of move: 


My'lPoV'* ^(/3o‘ “) 


(c) Set Bo^ 


Bo with probability a^g, 
Bo~^^ with probability 1 — a^g. 


Step 6: Sampling of given the current state of /3^, Bo, 7 and 6. 

(a) Generate y*' from a proposal distribution ^(y*'); see remark below for 
details about this proposal. 


(b) Galculate the MLEs given y*^* and y*' and compu te t 

proximations and m^{y* |5) through Eq. 

(c) Galculate the probability of move: 


re Laplace ap- 


3.7 


foiy*'\Boy/^ m;(yfr^-^)|^) g(yfr*-b) 
Jo(y*h-l)|/3o)Vb mN(y*'|5) q(y*') 


(d) Set yfr*) 


y*' with probability ay*, 

y*(*-i) with probability 1 — Qfy*. 


Step 7: Sampling of 5*^*^ given the current state of B-y, Bo and 7 . 

(a) If 6 is fixed at n go to Step 1, else implement (b)-(e) of Step 7. 

(b) Generate 5' from the proposal distribution = Gamma(a, h) with 

a = /sj and b = 
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Suggested proposals for Step 6: For y* we recommend the following proposals de¬ 
pending on the likelihood of the model and on the PEP prior that is used: 

i) For logistic regression a product binomial proposal distribution given by 

n* 

g(y*) = TT Binomial(iVi, TT*) with tt* = . ,, -- — -, 

where ttq = (1 -|- exp(—TT-yp) = (1 -|- exp(—and iVj denotes the 
number of trials of the observed data. 

ii) For Poisson regression a product Poisson proposal distribution given by 

n* 

Qiy*) = 

i=l 

For the CR-PEP prior A* = AqA^'I^^, where Aq = exp(/3o) and A-^-p) = exp(X.yP)/3..^). 
For the DR-PEP prior we utilize a random-walk proposal, i.e. X* = Vi ■ 
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